NNMF-Based Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets
Abstract
Modern, powerful access to huge amounts of data with varying levels of privacy brings a growing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between the data objects under analysis. In this paper, we extend our previous work on applying matrix techniques to privacy protection and present a novel algebraic technique, based on iterative methods, for distorting non-negative-valued data. As an unsupervised learning method for uncovering latent features in high-dimensional data, a low-rank non-negative matrix factorization (NNMF) is used to preserve the natural non-negativity of the data and to avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. This paper is the first in privacy-preserving data mining to combine non-negative matrix decomposition with distortion processing. Two iterative methods for solving the bound-constrained optimization problem in NNMF are compared experimentally on the Wisconsin Breast Cancer Dataset. The overall performance of NNMF in terms of distortion level and data utility is compared with our previously proposed SVD-based distortion strategies and with other popular data perturbation methods. Data utility is examined by cross-validation of a binary classification using a support vector machine. Our experimental results on data mining benchmark datasets indicate that, in comparison with standard data distortion techniques, the proposed NNMF-based method is very efficient in balancing data privacy and data utility, and that it offers a feasible and promising solution for high-accuracy privacy-preserving data mining.
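As a rough illustration of the idea in the abstract, the sketch below distorts a non-negative data matrix by replacing it with a low-rank non-negative approximation W H. The abstract does not name the two iterative methods it compares, so the Lee-Seung multiplicative update rule used here is an assumption, as are the function names `nnmf_distort` and `relative_error`, the default parameters, and the use of the relative Frobenius error as a stand-in for distortion level.

```python
import numpy as np

def nnmf_distort(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Return a distorted copy of A: the rank-`rank` non-negative product W @ H.

    Assumption: multiplicative updates minimizing ||A - W H||_F; the paper's
    own iterative methods are not specified in the abstract.
    """
    A = np.asarray(A, dtype=float)
    if (A < 0).any():
        raise ValueError("A must be non-negative")
    rng = np.random.default_rng(seed)
    n, m = A.shape
    # Strictly positive random start keeps every update well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Element-wise updates; non-negativity is preserved automatically.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W @ H

def relative_error(A, A_tilde):
    """Relative Frobenius distance between the original and distorted data."""
    return np.linalg.norm(A - A_tilde) / np.linalg.norm(A)

if __name__ == "__main__":
    # Hypothetical stand-in data: 30 records with 9 non-negative attributes.
    A = np.random.default_rng(1).random((30, 9))
    A_tilde = nnmf_distort(A, rank=3)
    print("relative distortion:", relative_error(A, A_tilde))
```

A smaller rank discards more detail and so distorts the data more strongly, while a larger rank stays closer to the original values. Measuring data utility as described in the abstract would additionally require training a classifier on the distorted matrix, which this sketch does not do.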
Related resources
Iterative Matrix Factorization Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets
Modern, powerful access to huge amounts of data with varying levels of privacy brings a growing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between the data objects under analysis. In this paper, we further our previous work on applying matrix decomposition techniques to protect pr...
Literature Survey: Non-Negative Matrix Factorization
This article surveys recent research on Non-Negative Matrix Factorization (NNMF), a relatively new technique for dimensionality reduction. It is based on the idea that in many data-processing tasks, negative numbers are physically meaningless. The NNMF technique addresses this problem by placing non-negativity constraints on the data model. I discuss the applications of NNMF, the algorithms and...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks such as Twitter intensively. It lets users publish tweets to say what they think about topics. There are numerous websites on the Internet that present Twitter. The user can enter a sentiment ta...
Gain Ratio Based Feature Selection Method for Privacy Preservation
Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure, thereby protecting individual data records and their privacy. There are various privacy preservation techniques, such as k-anonymity, l-diversity, t-closeness, and data perturbation. In this paper, the k-anonymity privacy protection technique is applied to high dimensional dataset...
Non-Negative Matrix Factorization and Term Structure of Interest Rates
Non-Negative Matrix Factorization (NNMF) is a technique for dimensionality reduction with a wide variety of applications from text mining to identification of concentrations in chemistry. NNMF deals with non-negative data and results in non-negative factors and factor loadings. Consequently, it is a natural choice when studying the term structure of interest rates. In this paper, NNMF is applie...